Chapter 1: Descriptive statistics

2 Steps to Data Analysis:

  1. What is the research question?

  2. What are the properties of the variables of primary interest?

2 Types of variables:

  • Categorical \(\rightarrow \) Responses can be sorted into a finite set of unordered categories
  • Quantitative \(\rightarrow \) Responses are measured on some sort of scale

Summary of descriptive methods

Categorical Data

Summarising a single categorical variable and summarising the association between two categorical variables are very similar tasks, so we’ll cover both here.

Numerical summaries of categorical data

The main tool is a table of frequencies (both one way for a single variable and two way for two variables)

One way table:

| Party     | Liberal | Labor |
|-----------|---------|-------|
| Frequency | 300     | 295   |

Two way table:

|        | Survived | Died |
|--------|----------|------|
| Male   | 142      | 709  |
| Female | 308      | 154  |

Graphical summaries of categorical data

2 types:

  • Bar chart of frequencies \(\rightarrow\) 1 var

  • Clustered bar chart (of frequencies) \(\rightarrow\) 2 vars

(Figures: bar chart of frequencies; clustered bar chart.)

DON’T USE PIE CHARTS: angles and areas are much harder to compare visually than the bar lengths of a bar chart.

Quantitative Data

3 things to look at

  • Location (or “centre”) \(\rightarrow\) a value around which most of the data lies
  • Spread \(\rightarrow\) how variable the values are around the centre
  • Shape \(\rightarrow\) other information regarding the distribution of data

Numerical summaries

Sample mean: \[ \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \]

Sample variance: \[ s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \]

Sample standard deviation: \[ s = \sqrt{s^2} \]

Sample median \[ \tilde{x}_{0.5} = \left\{ \begin{array}{l} x_{(\frac{n+1}{2})} \text{ if n is odd} \\ \frac{1}{2}(x_{(\frac{n}{2})} +x_{(\frac{n+2}{2})}) \text{ if n is even} \end{array} \right. \]

pth sample quantile:

\[ \tilde{x}_p = x_{(k)} \quad \text{where} \quad p = \frac{k-0.5}{n} \quad \text{for} \quad k \in \{1,2,3,\ldots,n\} \]

Inter-quartile Range: \[ IQR = \tilde{x}_{0.75} - \tilde{x}_{0.25} \]

Rank-based measures (median, IQR) are much less sensitive to outliers than moment-based measures (mean, variance, standard deviation).
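These definitions translate directly into code. A minimal sketch in Python (the data and the `sample_quantile` helper are illustrative assumptions; the helper implements the \(p = (k-0.5)/n\) convention above):

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                                # sample mean
var = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance (n - 1 divisor)
sd = math.sqrt(var)                                 # sample standard deviation

xs = sorted(data)
# sample median: middle value (odd n) or average of the two middle values (even n)
median = xs[n // 2] if n % 2 == 1 else (xs[n // 2 - 1] + xs[n // 2]) / 2

def sample_quantile(xs, p):
    """pth sample quantile: return x_(k) whose (k - 0.5)/n is closest to p."""
    m = len(xs)
    k = min(range(1, m + 1), key=lambda k: abs((k - 0.5) / m - p))
    return xs[k - 1]

iqr = sample_quantile(xs, 0.75) - sample_quantile(xs, 0.25)
```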

Graphical summaries of quantitative data

Dot chart, boxplot, histogram

Kernel density estimator: \[ \hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} w_h(x-x_i) \] where \(h\) is the bandwidth parameter.

Shape of a distribution

Here are some sample distributions with three different skews: (Figure: histogram shapes.)

It’s also worth checking for outliers, which can distort the apparent shape of the distribution.

Summarising Associations Between Variables

correlation coefficient (2 quant vars):

\[ r = \frac{1}{n-1} \sum_{i=1}^{n} (\frac{x_i - \bar{x}}{s_x})(\frac{y_i - \bar{y}}{s_y}) \] where \(\bar{x}\) and \(s_x\) are the sample mean and standard deviation of \(x\), and similarly for \(y\).

3 key facts:

  • \(|r| \leq 1\) always
  • \(r = -1\) \(\rightarrow\) an exact linear relationship with negative gradient
  • \(r = 1\) \(\rightarrow\) an exact linear relationship with positive gradient

2 measures:

  • Relationship strength \(\rightarrow\) how close \(r\) is to -1 or 1
  • Direction of association \(\rightarrow\) values less than zero suggest a decreasing relationship, values greater than zero suggest an increasing relationship
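A quick check of these facts, computing \(r\) from the formula above on made-up data (a Python sketch for illustration):

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))

# correlation coefficient: average product of standardised deviations
r = sum(((xi - xbar) / sx) * ((yi - ybar) / sy)
        for xi, yi in zip(x, y)) / (n - 1)

assert -1 <= r <= 1  # always holds
assert r > 0         # these data increase together
```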

Associations between categorical and quantitative variables

Use a comparative boxplot.

Transforming Data

Linear transformations

Linear transformations take the form \[ y_i = a + bx_i \] for each \(i\), with \(b \neq 0\)

It doesn’t affect the shape of the distribution \(\rightarrow\) only the location and spread.

A common Linear transformation is the \(z\)-score or standardised score: \[ z = \frac{x-\bar{x}}{s_x} \]

It measures how many standard deviations the value lies above/below the mean: the larger \(|z|\), the more unusual the value.
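A useful sanity check: after standardising, the values have sample mean 0 and sample standard deviation 1. A sketch with illustrative data:

```python
import math

x = [2, 4, 6, 8]
n = len(x)
xbar = sum(x) / n
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))

z = [(xi - xbar) / sx for xi in x]  # z-scores

# standardised data: sample mean 0, sample variance 1
assert abs(sum(z) / n) < 1e-12
assert abs(sum(zi ** 2 for zi in z) / (n - 1) - 1) < 1e-12
```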

Nonlinear transformations

The most common nonlinear transformation is the log transformation. It can reveal interesting relationships and structure when values are bunched too closely together. (Figure: log transforms.)

Important Note: Let \(y = h(x)\) be some nonlinear transformation of real values \(x\). In most cases: \[ \bar{y} \neq h(\bar{x}) \]

ie: the mean of the transform won’t be equal to the mean of the original data

Chapter 2: Random Variables

Introduction

  • A random variable is a variable whose value is determined by the outcome of a random process

  • A random variable is defined on a sample space \(\rightarrow\) the set of all possible outcomes

Eg (three coin tosses): \(S = \{HHH,HHT,HTH,THH,HTT,THT,TTH,TTT\}\)

  • When the outcomes are equally likely, the probability of a value is the proportion of outcomes in the sample space that produce it

Eg: Let \(X\) be the number of heads. Then, \[ Pr(X = 0) = \frac{1}{8}, \quad Pr(X = 1) = \frac{3}{8}, \quad Pr(X = 2) = \frac{3}{8}, \quad Pr(X = 3) = \frac{1}{8}. \]

Discrete Random Variables and Probability Functions

A random variable is discrete if it takes countably many values, ie: there is a countable set of values \(x\) with \(Pr(X=x) > 0\)

The probability structure of the discrete random variable \(X\) is given by \[ f_X(x) = Pr(X=x) \]

It has the following properties: \[ f_X(x) \geq 0 \text{ for all } x \in \mathbb{R} \] and \[ \sum_{\text{all } x} f_X(x) = 1 \]

eg: the probabilities from the heads and tails example add to 1: \[ \sum Pr(X = x) = Pr(X = 0) + Pr(X = 1) + Pr(X = 2) + Pr(X = 3) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = 1 \]
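The coin-toss probabilities can be recovered by brute-force enumeration of the sample space (a Python sketch using exact fractions):

```python
from fractions import Fraction
from itertools import product

# all 2^3 equally likely sequences of three fair coin tosses
outcomes = list(product("HT", repeat=3))

pmf = {}
for seq in outcomes:
    x = seq.count("H")  # X = number of heads
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(outcomes))

assert pmf[1] == Fraction(3, 8)
assert sum(pmf.values()) == 1  # probabilities sum to 1
```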

Continuous Random Variables and Density Functions

The analogue of the probability function for continuous random variables is the density function

\[ \int_A f_X(x)dx = Pr(X \in A) \]

It has 2 similar properties:

\[ f_X(x) \geq 0 \text{ for all } x \in \mathbb{R} \]

and

\[ \int_{-\infty}^{\infty} f_X(x)dx = 1 \]

Therefore, for any continuous random variable \(X\) and a pair of numbers \(a \leq b\) we have

\[ Pr(a \leq X \leq b) = \int_{a}^{b} f_X(x)dx = \text{area under } f_X \text{ between } a \text{ and } b. \]

Hence, if you can derive \(f_X(x)\) you can derive any probability about \(X\) and hence any property of \(X\).

Continuous random variables \(X\) have the property \[ Pr(X = a) = 0 \text{ for any } a \in \mathbb{R} \]

Hence, for continuous random variables, \(\leq\) and \(<\) (and \(\geq\) and \(>\)) are interchangeable: including or excluding a single endpoint doesn’t change the probability.
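The area interpretation can be checked numerically. A sketch using the Exponential(\(\beta\)) density from Chapter 3, comparing a midpoint-rule approximation of \(\int_a^b f_X\) against the closed-form answer (the integrator here is a hand-rolled illustration, not a library call):

```python
import math

beta = 2.0
def f(x):
    return math.exp(-x / beta) / beta  # Exponential(beta) density

def integrate(g, a, b, m=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / m
    return sum(g(a + (i + 0.5) * h) for i in range(m)) * h

a, b = 1.0, 3.0
approx = integrate(f, a, b)
exact = math.exp(-a / beta) - math.exp(-b / beta)  # closed form for this density

assert abs(approx - exact) < 1e-8
```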

Cumulative Distribution Function

The Cumulative Distribution Function (cdf) of the random variable \(X\) is \[ F_X(x) = Pr(X \leq x). \]

Eg for the coin-toss example: \[ F_X(1) = \frac{1}{8} + \frac{3}{8} = \frac{1}{2} \] \[ F_X(2) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} = \frac{7}{8} \]

Hence: \[ Pr(a < X \leq b) = F_X(b) - F_X(a) \]

Calculating \(F_X(x)\) from \(f_X(x)\):

\[ F_X(x) = \left\{ \begin{array}{l} \sum_{t \leq x} f_X(t) \quad \text{ if X is discrete} \\ \int_{-\infty}^{x}f_X(t)dt \quad \text{if X is continuous} \end{array} \right. \]

and vice-versa:

\[ f_X(x) = \left\{ \begin{array}{l} F_X(x) - F_X(x^-) \quad \text{ if X is discrete} \\ F'_X(x) \quad \quad \quad \quad \quad \; \text{ if X is continuous} \end{array} \right. \] where \(F_X(x^-)\) is the limiting value of \(F_X\) as we approach \(x\) from below.
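For the discrete case, the cdf is just a running sum of the probability function. A sketch with the coin-toss pmf:

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def F(x):
    """cdf: sum the pmf over all values t <= x."""
    return sum(p for t, p in pmf.items() if t <= x)

assert F(1) == Fraction(1, 2)
assert F(2) == Fraction(7, 8)
assert F(2) - F(0) == Fraction(3, 4)  # Pr(0 < X <= 2) = F(2) - F(0)
```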

Quantiles

A quantile is the same idea as a percentile: the \(p\)th quantile is the \(100p\)th percentile.

If \(F_X\) is strictly increasing in some interval then \(F^{-1}_X\) is well defined and, for a specified \(p \in (0,1)\), the pth quantile of \(F_X\) is \(x_p\) where: \[ F_X(x_p) = p \text{ or } x_p = F^{-1}_X(p) \]

Expectation and Moments of Random Variables

Expected value / mean of a discrete random variable: \[ E(X) = \sum_{\text{all } x} x \times Pr(X = x) = \sum_{\text{all } x } xf_X(x) \] Expected value / mean of a continuous random variable: \[ E(X) = \int_{-\infty}^{\infty} xf_X(x)dx \]

Expectation of transformed random variables

\[ E\{g(X)\} = \left\{ \begin{array}{l} \sum_{\text{all } x} g(x) f_X(x) \quad \; \; \text{if X is discrete} \\ \int_{-\infty}^{\infty} g(x) f_X(x)dx \quad \text{ if X is continuous} \end{array} \right. \]

The \(r\)th moment of \(X\) about \(a\) is defined as \(E\{(X-a)^r\}\)

Expectation of a variable under transformation

\[ E(a+bX) = a+bE(X) \] for both continuous and discrete

Standard Deviation and Variance

\[ Var(X) = E\{(X-\mu)^2\} = E(X^2) - E(X)^2 \]

\[ sd = \sqrt{Var(X)} \]
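A worked example of these definitions for a fair six-sided die, in exact arithmetic:

```python
from fractions import Fraction

# fair die: f_X(x) = 1/6 for x in {1, ..., 6}
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

EX = sum(x * p for x, p in pmf.items())        # E(X)
EX2 = sum(x ** 2 * p for x, p in pmf.items())  # E(X^2), via the transformation rule
var = EX2 - EX ** 2                            # Var(X) = E(X^2) - E(X)^2

assert EX == Fraction(7, 2)
assert var == Fraction(35, 12)
```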

Chebychev’s Inequality

Chebychev’s Inequality is a fundamental result concerning tail probabilities of general random variables. It is useful for derivation of convergence results given later.

\[ Pr(|X - \mu| > k\sigma) \leq \frac{1}{k^2} \]

where \(k > 0\) is a constant

It’s often stated as: “the probability that \(X\) is more than \(k\) standard deviations from its mean is at most \(1/k^2\).”

Chebychev’s Inequality makes no assumptions about the distribution of X.
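A quick check of the bound for \(X \sim Uniform(0,1)\), where the tail probability has a closed form (illustration only; the inequality is usually very loose):

```python
import math

# X ~ Uniform(0, 1): mu = 1/2, sigma = 1/sqrt(12)
mu, sigma = 0.5, 1 / math.sqrt(12)

for k in (1.5, 2.0, 3.0):
    # exact Pr(|X - mu| > k*sigma): length of (0, 1) outside the interval
    tail = max(0.0, 1 - 2 * k * sigma)
    assert tail <= 1 / k ** 2  # Chebychev's bound holds
```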

Deriving Probability Distributions

In some cases you can derive the distribution from first principles

For continuous random variables, this means attempting to derive an expression for cumulative probabilities \(F_X(x)\), then \(f_X(x) = F'_X(x)\)

Transformation of Random Variables

Transformations of discrete random variables

For discrete X we have: \[ f_Y(y) = Pr(Y = y) = Pr(h(X) = y) = \sum_{x:h(x)=y} Pr(X = x) \]

Transformations of continuous random variables

For a continuous random variable \(X\), if \(h\) is monotonic over the set \(\{x : f_X(x) > 0\}\), then

\[ f_Y(y) = f_X(x) |\frac{dx}{dy}| \\ \quad \quad \quad \qquad= f_X\{h^{-1}(y)\}|\frac{dx}{dy}| \] for \(y\) such that \(f_X\{h^{-1}(y)\} > 0\)

Chapter 3: Common Distributions

Summary

| Distribution | Type | Parameters | \(f_X(x)\) | Domain | \(E(X)\) | \(Var(X)\) | Uses |
|---|---|---|---|---|---|---|---|
| Bernoulli | Discrete | \(p\) | \(p^{x}(1-p)^{1-x}\) | \(\{0,1\}\) | \(p\) | \(p(1-p)\) | A single trial with two possible outcomes (Bernoulli trial). |
| Binomial \(Bin(n,p)\) | Discrete | \(n,p\) | \(\binom{n}{x}p^{x}(1-p)^{n-x}\) | \(\{0,1,2,\dots,n\}\) | \(np\) | \(np(1-p)\) | Number of successes from \(n\) independent Bernoulli trials. |
| Geometric | Discrete | \(p\) | \(p(1-p)^{x-1}\) | \(\{1,2,\dots\}\) | \(\frac{1}{p}\) | \(\frac{1-p}{p^2}\) | Number of independent Bernoulli trials until first success. |
| Hypergeometric | Discrete | \(n,m,N\) | \(\frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}\) | \(\{0,1,\dots,\min(m,n)\}\) | \(\frac{nm}{N}\) | \(\frac{nm}{N}(1-\frac{m}{N})\frac{N-n}{N-1}\) | Number of successes in a sample of size \(n\) drawn without replacement from \(N\) items, of which \(m\) are successes. |
| Poisson | Discrete | \(\lambda\) | \(\frac{e^{-\lambda}\lambda^{x}}{x!}\) | \(\{0,1,2,\dots\}\) | \(\lambda\) | \(\lambda\) | Counting independent events that occur at a constant rate. |
| Exponential | Continuous | \(\beta\) | \(\frac{1}{\beta}e^{-x/\beta}\) | \(x > 0\) | \(\beta\) | \(\beta^2\) | Time between independent events that occur at a constant rate. |
| Uniform | Continuous | \(a,b\) | \(\frac{1}{b-a}\) | \(a < x < b\) | \(\frac{a+b}{2}\) | \(\frac{(b-a)^2}{12}\) | An outcome equally likely to fall anywhere in the interval \((a,b)\). |
| Normal \(N(\mu,\sigma^2)\) | Continuous | \(\mu,\sigma^2\) | \(\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\) | \(-\infty < x < \infty\) | \(\mu\) | \(\sigma^2\) | Useful for some variables (e.g. height) but mostly for inference. |
| Gamma \(Gamma(\alpha,\beta)\) | Continuous | \(\alpha,\beta\) | \(\frac{e^{-x/\beta}x^{\alpha-1}}{\Gamma(\alpha)\beta^\alpha}\) | \(x > 0\) | \(\alpha\beta\) | \(\alpha\beta^2\) | Generalisation of the exponential. |
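As a consistency check on the table, the \(Bin(n,p)\) row can be verified directly from its probability function (exact fractions; the parameter values are illustrative):

```python
import math
from fractions import Fraction

n, p = 10, Fraction(3, 10)
pmf = {x: math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

total = sum(pmf.values())
mean = sum(x * f for x, f in pmf.items())
var = sum(x ** 2 * f for x, f in pmf.items()) - mean ** 2

assert total == 1              # probabilities sum to 1
assert mean == n * p           # E(X) = np
assert var == n * p * (1 - p)  # Var(X) = np(1-p)
```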

Chapter 4: Bivariate Distributions

Observations are often taken in pairs \(\rightarrow\) one observation of each of two variables

Often we need to look at the relationship between 2 variables.

Joint Probability and Density Function

The probability that \(X = x \text{ and } Y = y\) \[ f_{X,Y}(x,y) = Pr(X=x,Y=y) \]

Solving Discrete Joint probability problems

  1. Tabulate the possible values of each variable
  2. Work out the joint probabilities (multiplying the marginal probabilities when the variables are independent)
  3. Read off either a single cell or the sum of a row/column

Joint Density Functions

The joint density function of continuous random variables is a bivariate function with the property \[ \int \int_A f_{X,Y}(x,y)dxdy = Pr((X,Y) \in A) \]

for any subset \(A\) of \(\mathbb{R}^2\)

Solving Joint Density problems

  1. Figure out the area you want to integrate over

  2. Insert the limits into the integrals of the joint density function

  3. Integrate with respect to x and Integrate with respect to y (using the limits)

  4. Solve

Alternatively, you could integrate with respect to \(x\) and \(y\) first and then substitute the limits into the resulting antiderivative

Marginal Probability/Density Functions

Discrete

\[ f_X(x) = \sum_{\text{all } y} f_{X,Y}(x,y) \]

\[ f_Y(y) = \sum_{\text{all } x} f_{X,Y}(x,y) \]
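These sums can be read straight off a joint table. A sketch with a hypothetical joint pmf on \(\{0,1\}^2\):

```python
from fractions import Fraction

# hypothetical joint pmf of (X, Y)
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(2, 8),
         (1, 0): Fraction(3, 8), (1, 1): Fraction(2, 8)}

# marginals: sum the joint pmf over the other variable
fX = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
fY = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

assert fX[1] == Fraction(5, 8)
assert fY[0] == Fraction(1, 2)
assert sum(fX.values()) == sum(fY.values()) == 1
```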

Example:

(Figure: marginal distribution example.)

Continuous

\[ f_X(x) = \int^\infty_{-\infty} f_{X,Y}(x,y) dy \]

\[ f_Y(y) = \int^\infty_{-\infty} f_{X,Y}(x,y) dx \]

Conditional Probability and Density Functions

Firstly, recall Bayes’ Rule: \[ P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{P(A \cap B)}{P(B)}, \text{ if } P(B) \neq 0 \]

The following are just applications of this rule to discrete and continuous probabilities.

Discrete

\[ f_{X|Y}(x|y) = Pr(X = x | Y = y) = \frac{Pr(X = x , Y = y)}{Pr(Y=y)} = \frac{f_{X,Y}(x,y)}{f_Y(y)} \]

The same holds for \(f_{Y|X}(y|x)\).

\[ Pr(Y \in A |X = x) = \sum_{y \in A} f_{Y|X}(y|x) \]

Continuous

\[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} \] The same holds for \(f_{Y|X}(y|x)\).

\[ Pr(a \leq Y \leq b | X = x) = \int_a^b f_{Y|X}(y|x) dy \]

Conditional Expected Value and Variance

Expected Value

The conditional expected value of \(X\) given \(Y = y\) is \[ E(X|Y = y) = \left\{ \begin{array}{l} \sum_{\text{all } x} xPr(X=x | Y = y)\quad \text{if X is discrete} \\ \int_{-\infty}^{\infty} xf_{X|Y}(x|y)dx \quad \quad \quad \quad \;\text{if X is continuous} \end{array} \right. \]

Similarly,

\[ E(Y|X = x) = \left\{ \begin{array}{l} \sum_{\text{all } y} yPr(Y = y | X = x)\quad \text{if Y is discrete} \\ \int_{-\infty}^{\infty} yf_{Y|X}(y|x)dy \quad \quad \quad \quad \;\text{if Y is continuous} \end{array} \right. \]

Variance

\[ Var(X|Y=y) = E(X^2 | Y=y) - \{E(X| Y=y)\}^2 \]

Where: \[ E(X^2|Y = y) = \left\{ \begin{array}{l} \sum_{\text{all } x} x^2Pr(X=x | Y = y)\quad \text{if X is discrete} \\ \int_{-\infty}^{\infty} x^2f_{X|Y}(x|y)dx \quad \quad \quad \quad \;\text{if X is continuous} \end{array} \right. \]

This also applies to \(Var (Y|X =x) \)

Independent Random Variables

Random variables \(X\) and \(Y\) are independent if and only if: \[ f_{Y|X}(y|x) = f_Y(y) \] or equivalently

\[ f_{X|Y}(x|y) = f_X(x) \]

And hence

\[ F_{X,Y}(x,y) = F_X(x) \times F_Y(y) \]

also hence

\[ E(XY) = E(X) \times E(Y) \\ \text{or more generally:} \\ E(g(X) \times h(Y)) = E\{g(X)\} \times E\{h(Y)\} \]

Covariance and Correlation

Covariance

\[ Cov(X,Y) = E\{(X-\mu_X)(Y-\mu_Y)\} \\ \text{where } \mu_X = E(X) \text{ and } \mu_Y = E(Y) \]

The covariance measures how \(X\) and \(Y\) vary together linearly. If it’s \(> 0\) then \(X\) and \(Y\) are positively associated (ie: they move the same way \(\rightarrow\) when \(X\) is big, \(Y\) tends to be big).

The reverse holds for \(< 0\) \(\rightarrow\) when \(X\) is big, \(Y\) tends to be small.

Here are 2 more results from the covariance:

\[ Cov(X,X) = Var(X) \] this one is fairly self-explanatory

\[ Cov(X,Y) = E(XY) - \mu_X\mu_Y = E(XY) -E(X)E(Y) \]

If \(X\) and \(Y\) are independent then \(Cov(X,Y) = 0\). The converse is not true in general: zero covariance does not imply independence.

Covariance also comes into play when finding bivariate variance transforms:

\[ Var(aX+bY) = a^2Var(X) + 2abCov(X,Y) + b^2Var(Y) \] Hence:

\[ Var(X+Y) = Var(X) + 2Cov(X,Y) + Var(Y) \]

If \(X\) and \(Y\) are independent: \[ Var(X+Y) = Var(X) + Var(Y) \]

\[ Var(X-Y) = Var(X) + Var(Y) \]
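All of these identities can be checked on a small joint pmf (the table is hypothetical; \(X\) and \(Y\) here are deliberately dependent, so the covariance term matters):

```python
from fractions import Fraction

# hypothetical joint pmf of (X, Y) on {0,1} x {0,1}
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(2, 8),
         (1, 0): Fraction(3, 8), (1, 1): Fraction(2, 8)}

def E(g):
    """expectation of g(X, Y) under the joint pmf"""
    return sum(g(x, y) * p for (x, y), p in joint.items())

muX, muY = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: x * y) - muX * muY  # Cov = E(XY) - E(X)E(Y)
varX = E(lambda x, y: x ** 2) - muX ** 2
varY = E(lambda x, y: y ** 2) - muY ** 2

# Var(X + Y) computed directly equals Var(X) + 2Cov(X,Y) + Var(Y)
varSum = E(lambda x, y: (x + y) ** 2) - E(lambda x, y: x + y) ** 2
assert varSum == varX + 2 * cov + varY
```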

Correlation

\[ Corr(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X) \times Var(Y)}} \]

The correlation measures the strength of the relationship between \(X\) and \(Y\).

If \(Corr(X,Y) = 0\) then the two variables are uncorrelated.

Correlation always lies between -1 and 1. The further it is from zero, the stronger the relationship. When the correlation is exactly -1 or 1 the two variables are perfectly linearly related, meaning they can be expressed in the form \(Y = a+bX\) (with \(b < 0\) for \(r = -1\) and \(b > 0\) for \(r = 1\)).

The Bivariate Normal Distribution

(Figure: bivariate normal density formula.)

Where \[ \rho = Corr(X,Y) \]

Below is a 3D bivariate normal distribution. (Figure: 3D bivariate normal density surface.)

Todo: FINISH CHAPTER

Chapter 5: Survey Designs and Experiments

Introduction

The way we collect data affects how we conduct our analysis. Data is almost never going to be exactly how we want it from the start, so we often need to sample carefully and design experiments to minimise problems with the data.

Survey Design

Representativeness

When collecting data we need to ensure that it’s representative and random, so that we can make accurate inferences about the larger population.

A sample is said to be representative if:

\[ f_{X_i}(x) = f_X(x) \text{ for each } i. \]

REPRESENTATIVENESS IS MORE IMPORTANT THAN SAMPLE SIZE. IT IS BETTER TO HAVE A SMALL BUT REPRESENTATIVE SAMPLE THAN A LARGE BUT UNREPRESENTATIVE SAMPLE.

Random Samples

A random sample of size \(n\) is a set of \(n\) random variables that are independent and have the same probability distribution.

A simple random sample is one drawn without replacement in which every element of the population is equally likely to be sampled.

In R this looks like

# 30 values from the standard normal distribution 
x <- rnorm(30)

# sample 10 of those values without replacement 
sample(x, 10)
##  [1] -0.2151128 -0.3356333  1.0256975  0.5306271  0.1870914  1.8907001
##  [7]  0.5010078  0.6518803  1.3503825 -1.0070247

Chapter 6: Distribution of Sums and Averages of Random Variables

Suppose that \(X\) and \(Y\) are independent, non-negative random variables and let \(Z = X + Y\).
Then for the discrete case: \[ f_Z(z) = \sum_{y=0}^{z} f_X(z-y)f_Y(y), \quad z=0,1,\dots \]

for the continuous case \[ f_Z(z) = \int_{\text{all possible } y} f_X(z-y)f_Y(y)dy \]
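A sketch of the discrete convolution for the classic example of \(Z = X + Y\) with two independent fair dice:

```python
from fractions import Fraction

fX = {x: Fraction(1, 6) for x in range(1, 7)}  # first die
fY = dict(fX)                                  # second die, independent

# f_Z(z) = sum over y of f_X(z - y) f_Y(y)
fZ = {z: sum(fX.get(z - y, Fraction(0)) * fY[y] for y in fY)
      for z in range(2, 13)}

assert fZ[7] == Fraction(1, 6)  # 6 of the 36 outcomes sum to 7
assert sum(fZ.values()) == 1
```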

Moment Generating Functions

A moment generating function of a random variable \(X\) is:

\[ m_X(u) = E(e^{uX}) \]

In general: \[ E(X^r) = m^{(r)}_X (0) \text{ for } r = 0,1,2,\dots \] Where \(m^{(r)}_X\) is the \(r\)th derivative of \(m_X(u)\).

Moment generating functions of sums and averages

\[ m_{X+Y}(u) = m_X(u)m_Y(u) \]

or more generally \[ m_{\sum^{n}_{i=1} X_i}(u) = \prod_{i=1}^{n} m_{X_{i}}(u) \]

and for an average \(\bar{X} = \frac{1}{n}\sum^{n}_{i=1} X_i\): \[ m_{\bar{X}}(u) = \prod_{i=1}^{n} m_{X_{i}}\left(\frac{u}{n}\right) \]
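A numeric check of the product rule: the sum of two independent Bernoulli(\(p\)) variables is \(Bin(2,p)\), and its mgf factors as claimed (the values of \(p\) and \(u\) are illustrative):

```python
import math

p = 0.3

def m_bern(u):
    return 1 - p + p * math.exp(u)  # mgf of Bernoulli(p)

def m_bin2(u):
    # mgf of Bin(2, p) computed directly from its pmf
    return sum(math.comb(2, x) * p ** x * (1 - p) ** (2 - x) * math.exp(u * x)
               for x in range(3))

u = 0.7
assert abs(m_bin2(u) - m_bern(u) * m_bern(u)) < 1e-12
assert abs(m_bern(0.0) - 1) < 1e-12  # any mgf equals 1 at u = 0
```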

Central Limit Theorem

If \(X_1, X_2, \dots, X_n\) are independent random variables with mean \(\mu\) and variance \(\sigma^2\), then for large \(n\) \[ \bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right) \] equivalently, \(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\) is approximately \(N(0,1)\), whatever the distribution of the \(X_i\).

Chapter 7

Bias

\[ bias( \hat{\theta}) = E(\hat{\theta}) - \theta \]

Standard Error

\[ se(\hat{\theta}) = \sqrt{Var(\hat{\theta})} \]

Mean Squared Error

\[ MSE(\hat{\theta}) = E\{(\hat{\theta} - \theta)^2\} \]
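These three quantities are linked by \(MSE = bias^2 + Var\), which can be verified by simulating an estimator (here \(\hat{\theta} = \bar{X}\) for a standard normal mean; the identity below holds exactly when all three are computed with the same divisor):

```python
import random

random.seed(1)
theta = 0.0  # true mean of the simulated population

# 500 realisations of the estimator theta_hat = mean of 20 N(0,1) draws
estimates = [sum(random.gauss(theta, 1) for _ in range(20)) / 20
             for _ in range(500)]

m = sum(estimates) / len(estimates)
bias = m - theta
var = sum((e - m) ** 2 for e in estimates) / len(estimates)
mse = sum((e - theta) ** 2 for e in estimates) / len(estimates)

assert abs(mse - (bias ** 2 + var)) < 1e-9  # MSE = bias^2 + variance
```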
